{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AVv_M1Dz9TDz"
      },
      "source": [
        "# Implementing semantic cache to improve a RAG system with FAISS.\n",
        "\n",
        "_Authored by: [Pere Martra](https://github.com/peremartra)_\n",
        "\n",
        "In this notebook, we will explore a typical RAG solution that uses an open-source model and the vector database ChromaDB. **However, we will integrate a semantic cache system that stores user queries and decides whether to build the enriched prompt with information from the vector database or from the cache.**\n",
        "\n",
        "A semantic caching system aims to identify similar or identical user requests. When a matching request is found, the system retrieves the corresponding information from the cache, reducing the need to fetch it from the original source.\n",
        "\n",
        "Because the comparison takes the semantic meaning of the requests into account, they don't have to be identical for the system to recognize them as the same question. They can be phrased differently or contain errors, whether typographical or in sentence structure, and the system can still identify that the user is requesting the same information.\n",
        "\n",
        "For instance, queries like **What is the capital of France?**, **Tell me the name of the capital of France?**, and **What The capital of France is?** all convey the same intent and should be identified as the same question.\n",
        "\n",
        "While the model's response may differ based on the request for a concise answer in the second example, the information retrieved from the vector database should be the same. This is why I'm placing the cache system between the user and the vector database, not between the user and the Large Language Model.\n",
        "\n",
        "\n",
        "<img src=\"https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/semantic_cache.jpg\">\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5gtBERjX1vFd"
      },
      "source": [
        "Most tutorials that guide you through creating a RAG system are designed for single-user use in a testing environment: a notebook that interacts with a local vector database and either makes API calls or uses a locally stored model.\n",
        "\n",
        "This architecture quickly becomes insufficient when attempting to move one of these solutions to production, where it might receive anywhere from tens to thousands of recurring requests.\n",
        "\n",
        "One way to improve performance is to use one or more semantic caches. A semantic cache retains the results of previous requests and, before resolving a new request, checks whether a similar one has been received before. If so, instead of re-executing the process, it retrieves the information from the cache.\n",
        "\n",
        "In a RAG system, two steps are time-consuming:\n",
        "* Retrieving the information used to construct the enriched prompt.\n",
        "* Calling the Large Language Model to obtain the response.\n",
        "\n",
        "A semantic cache can be implemented at either step, and we could even have two caches, one for each.\n",
        "\n",
        "Placing it at the model's response point may lead to a loss of influence over the obtained response. Our cache system could consider \"Explain the French Revolution in 10 words\" and \"Explain the French Revolution in a hundred words\" as the same query. If our cache system stores model responses, users might think that their instructions are not being followed accurately.\n",
        "\n",
        "But both requests will require the same information to enrich the prompt. This is the main reason why I chose to place the semantic cache system between the user's request and the retrieval of information from the vector database.\n",
        "\n",
        "However, this is a design decision. Depending on the type of responses and system requests, it can be placed at one point or another. It's evident that caching model responses would yield the most time savings, but as I've already explained, it comes at the cost of losing user influence over the response.\n"
      ]
    },
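    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the lookup step concrete, here is a minimal, hedged sketch of a retrieval-side cache. It is a toy that is independent of the implementation built in this notebook: the class name `ToyRetrievalCache`, the `threshold` value, and the hand-written three-dimensional vectors (standing in for real sentence embeddings) are all assumptions made for illustration.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "\n",
        "def cosine_similarity(a, b):\n",
        "    # Cosine of the angle between two non-zero vectors of equal length.\n",
        "    dot = sum(x * y for x, y in zip(a, b))\n",
        "    norm_a = math.sqrt(sum(x * x for x in a))\n",
        "    norm_b = math.sqrt(sum(y * y for y in b))\n",
        "    return dot / (norm_a * norm_b)\n",
        "\n",
        "\n",
        "class ToyRetrievalCache:\n",
        "    # Hypothetical helper: maps query embeddings to previously retrieved documents.\n",
        "\n",
        "    def __init__(self, threshold=0.9):\n",
        "        self.threshold = threshold  # minimum similarity that counts as a hit\n",
        "        self.entries = []           # list of (embedding, documents) pairs\n",
        "\n",
        "    def lookup(self, embedding):\n",
        "        # Return the documents of the most similar cached query, or None on a miss.\n",
        "        best_score, best_documents = -1.0, None\n",
        "        for cached_embedding, documents in self.entries:\n",
        "            score = cosine_similarity(embedding, cached_embedding)\n",
        "            if score > best_score:\n",
        "                best_score, best_documents = score, documents\n",
        "        return best_documents if best_score >= self.threshold else None\n",
        "\n",
        "    def store(self, embedding, documents):\n",
        "        # Remember what the vector database returned for this query embedding.\n",
        "        self.entries.append((embedding, documents))\n",
        "\n",
        "\n",
        "cache = ToyRetrievalCache(threshold=0.9)\n",
        "cache.store([1.0, 0.0, 0.2], ['Paris is the capital of France.'])\n",
        "\n",
        "print(cache.lookup([0.9, 0.1, 0.2]))  # similar vector: cached documents are returned\n",
        "print(cache.lookup([0.0, 1.0, 0.0]))  # unrelated vector: None, so query the vector database\n",
        "```\n",
        "\n",
        "On a miss, the caller would query the vector database and then call `store` with the result, so that later rephrasings of the same question can be served from the cache. The threshold is a tuning decision: if it is too low, unrelated questions share cached documents; if it is too high, the cache rarely hits."
      ]
    },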
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uizxY8679TDz"
      },
      "source": [
        "# Import and load the libraries.\n",
        "To start, we need to install the necessary Python packages.\n",
        "* **[sentence transformers](https://www.sbert.net/)**. This library is needed to transform sentences into fixed-length vectors, also known as embeddings.\n",
        "* **[xformers](https://github.com/facebookresearch/xformers)**. A package that provides libraries and utilities to facilitate working with transformer models. We install it to avoid an error when working with the model and embeddings.  \n",
        "* **[chromadb](https://www.trychroma.com/)**. This is our vector database. ChromaDB is easy to use and open source, and is perhaps the most widely used vector database for storing embeddings.\n",
        "* **[accelerate](https://github.com/huggingface/accelerate)**. Necessary to run the model on a GPU.  "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:30:10.787688Z",
          "iopub.status.busy": "2024-02-29T17:30:10.787382Z",
          "iopub.status.idle": "2024-02-29T17:34:12.804579Z",
          "shell.execute_reply": "2024-02-29T17:34:12.80338Z",
          "shell.execute_reply.started": "2024-02-29T17:30:10.787657Z"
        },
        "id": "r1nUzd1u9TD0",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "!pip install -q transformers==4.38.1\n",
        "!pip install -q accelerate==0.27.2\n",
        "!pip install -q sentence-transformers==2.5.1\n",
        "!pip install -q xformers==0.0.24\n",
        "!pip install -q chromadb==0.4.24\n",
        "!pip install -q datasets==2.17.1"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:35:23.197598Z",
          "iopub.status.busy": "2024-02-29T17:35:23.197205Z",
          "iopub.status.idle": "2024-02-29T17:35:23.202259Z",
          "shell.execute_reply": "2024-02-29T17:35:23.201404Z",
          "shell.execute_reply.started": "2024-02-29T17:35:23.197556Z"
        },
        "id": "5jUwC_eE9TD0",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import pandas as pd"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9P-kYtc79TD1"
      },
      "source": [
        "# Load the Dataset\n",
        "Because we are working in a free environment with only a few GB of memory, I limit the number of rows used from the dataset with the variable `MAX_ROWS`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xZsN8yzUvfjN"
      },
      "outputs": [],
      "source": [
        "# Log in to Hugging Face. This is required to use the Gemma model\n",
        "# and recommended for accessing public models and datasets.\n",
        "from getpass import getpass\n",
        "if 'hf_key' not in locals():\n",
        "  hf_key = getpass(\"Your Hugging Face API Key: \")\n",
        "!huggingface-cli login --token $hf_key"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 47,
      "metadata": {
        "id": "9IVxu-uxtCTw"
      },
      "outputs": [],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "data = load_dataset(\"keivalya/MedQuad-MedicalQnADataset\", split='train')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hmor-i1j9TD1"
      },
      "source": [
        "ChromaDB requires each record to have a unique identifier. The following statement creates one by adding a new column called **id**, populated from the DataFrame index.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 48,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 536
        },
        "id": "WbLf8c7_yHwy",
        "outputId": "492eac81-2f7b-4063-f444-405bf489d08e"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "summary": "{\n  \"name\": \"data\",\n  \"rows\": 16407,\n  \"fields\": [\n    {\n      \"column\": \"qtype\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 16,\n        \"samples\": [\n          \"susceptibility\",\n          \"symptoms\",\n          \"information\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Question\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 14979,\n        \"samples\": [\n          \"What are the symptoms of Danon disease ?\",\n          \"What is (are) Dowling-Degos disease ?\",\n          \"What are the genetic changes related to Pearson marrow-pancreas syndrome ?\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Answer\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 15817,\n        \"samples\": [\n          \"These resources address the diagnosis or management of glycogen storage disease type III:  - Gene Review: Gene Review: Glycogen Storage Disease Type III  - Genetic Testing Registry: Glycogen storage disease type III   These resources from MedlinePlus offer information about the diagnosis and management of various health conditions:  - Diagnostic Tests  - Drug Therapy  - Surgery and Rehabilitation  - Genetic Counseling   - Palliative Care\",\n          \"Diagnostic Challenges\\n  \\nFor doctors, diagnosing chronic fatigue syndrome (CFS) can be complicated by a number of factors:\\n  \\n   - There's no lab test or biomarker for CFS.\\n   - Fatigue and other symptoms of CFS are common to many illnesses.\\n   - For some CFS patients, it may not be obvious to doctors that they are ill.\\n   - The illness has a pattern of remission and relapse.\\n   - Symptoms vary from person to person in type, number, and severity.\\n  \\n  \\nThese factors 
have contributed to a low diagnosis rate. Of the one to four million Americans who have CFS, less than 20% have been diagnosed.\\n  Exams and Screening Tests for CFS\\n  \\nBecause there is no blood test, brain scan, or other lab test to diagnose CFS, the doctor should first rule out other possible causes.\\n  \\nIf a patient has had 6 or more consecutive months of severe fatigue that is reported to be unrelieved by sufficient bed rest and that is accompanied by nonspecific symptoms, including flu-like symptoms, generalized pain, and memory problems, the doctor should consider the possibility that the patient may have CFS. Further exams and tests are needed before a diagnosis can be made:\\n  \\n   - A detailed medical history will be needed and should include a review of medications that could be causing the fatigue and symptoms\\n   - A thorough physical and mental status examination will also be needed\\n   - A battery of laboratory screening tests will be needed to help identify or rule out other possible causes of the symptoms that could be treated\\n   - The doctor may also order additional tests to follow up on results of the initial screening tests\\n  \\n  \\nA CFS diagnosis requires that the patient has been fatigued for 6 months or more and has 4 of the 8 symptoms for CFS for 6 months or more. 
If, however, the patient has been fatigued for 6 months or more but does not have four of the eight symptoms, the diagnosis may be idiopathic fatigue.\\n  \\nThe complete process for diagnosing CFS can be found here.\\n  \\nAdditional information for healthcare professionals on use of tests can be found here.\",\n          \"Eating, diet, and nutrition have not been shown to play a role in causing or preventing simple kidney cysts.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"id\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 4736,\n        \"min\": 0,\n        \"max\": 16406,\n        \"num_unique_values\": 16407,\n        \"samples\": [\n          3634,\n          15104,\n          4395\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
              "type": "dataframe",
              "variable_name": "data"
            },
            "text/html": [
              "\n",
              "  <div id=\"df-e3cca7df-77db-4037-bb3f-d65b3ff8cbb0\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>qtype</th>\n",
              "      <th>Question</th>\n",
              "      <th>Answer</th>\n",
              "      <th>id</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>susceptibility</td>\n",
              "      <td>Who is at risk for Lymphocytic Choriomeningiti...</td>\n",
              "      <td>LCMV infections can occur after exposure to fr...</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>symptoms</td>\n",
              "      <td>What are the symptoms of Lymphocytic Choriomen...</td>\n",
              "      <td>LCMV is most commonly recognized as causing ne...</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>susceptibility</td>\n",
              "      <td>Who is at risk for Lymphocytic Choriomeningiti...</td>\n",
              "      <td>Individuals of all ages who come into contact ...</td>\n",
              "      <td>2</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>exams and tests</td>\n",
              "      <td>How to diagnose Lymphocytic Choriomeningitis (...</td>\n",
              "      <td>During the first phase of the disease, the mos...</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>treatment</td>\n",
              "      <td>What are the treatments for Lymphocytic Chorio...</td>\n",
              "      <td>Aseptic meningitis, encephalitis, or meningoen...</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>prevention</td>\n",
              "      <td>How to prevent Lymphocytic Choriomeningitis (L...</td>\n",
              "      <td>LCMV infection can be prevented by avoiding co...</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>information</td>\n",
              "      <td>What is (are) Parasites - Cysticercosis ?</td>\n",
              "      <td>Cysticercosis is an infection caused by the la...</td>\n",
              "      <td>6</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>susceptibility</td>\n",
              "      <td>Who is at risk for Parasites - Cysticercosis? ?</td>\n",
              "      <td>Cysticercosis is an infection caused by the la...</td>\n",
              "      <td>7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>exams and tests</td>\n",
              "      <td>How to diagnose Parasites - Cysticercosis ?</td>\n",
              "      <td>If you think that you may have cysticercosis, ...</td>\n",
              "      <td>8</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>treatment</td>\n",
              "      <td>What are the treatments for Parasites - Cystic...</td>\n",
              "      <td>Some people with cysticercosis do not need to ...</td>\n",
              "      <td>9</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-e3cca7df-77db-4037-bb3f-d65b3ff8cbb0')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-e3cca7df-77db-4037-bb3f-d65b3ff8cbb0 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-e3cca7df-77db-4037-bb3f-d65b3ff8cbb0');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-8d88a5c2-4d94-419e-a3de-0292c6501384\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8d88a5c2-4d94-419e-a3de-0292c6501384')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-8d88a5c2-4d94-419e-a3de-0292c6501384 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "text/plain": [
              "             qtype                                           Question  \\\n",
              "0   susceptibility  Who is at risk for Lymphocytic Choriomeningiti...   \n",
              "1         symptoms  What are the symptoms of Lymphocytic Choriomen...   \n",
              "2   susceptibility  Who is at risk for Lymphocytic Choriomeningiti...   \n",
              "3  exams and tests  How to diagnose Lymphocytic Choriomeningitis (...   \n",
              "4        treatment  What are the treatments for Lymphocytic Chorio...   \n",
              "5       prevention  How to prevent Lymphocytic Choriomeningitis (L...   \n",
              "6      information          What is (are) Parasites - Cysticercosis ?   \n",
              "7   susceptibility    Who is at risk for Parasites - Cysticercosis? ?   \n",
              "8  exams and tests        How to diagnose Parasites - Cysticercosis ?   \n",
              "9        treatment  What are the treatments for Parasites - Cystic...   \n",
              "\n",
              "                                              Answer  id  \n",
              "0  LCMV infections can occur after exposure to fr...   0  \n",
              "1  LCMV is most commonly recognized as causing ne...   1  \n",
              "2  Individuals of all ages who come into contact ...   2  \n",
              "3  During the first phase of the disease, the mos...   3  \n",
              "4  Aseptic meningitis, encephalitis, or meningoen...   4  \n",
              "5  LCMV infection can be prevented by avoiding co...   5  \n",
              "6  Cysticercosis is an infection caused by the la...   6  \n",
              "7  Cysticercosis is an infection caused by the la...   7  \n",
              "8  If you think that you may have cysticercosis, ...   8  \n",
              "9  Some people with cysticercosis do not need to ...   9  "
            ]
          },
          "execution_count": 48,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data = data.to_pandas()\n",
        "data[\"id\"]=data.index\n",
        "data.head(10)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:35:25.528374Z",
          "iopub.status.busy": "2024-02-29T17:35:25.527688Z",
          "iopub.status.idle": "2024-02-29T17:35:25.709895Z",
          "shell.execute_reply": "2024-02-29T17:35:25.709127Z",
          "shell.execute_reply.started": "2024-02-29T17:35:25.528341Z"
        },
        "id": "DZf0zCI29TD1",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "MAX_ROWS = 15000\n",
        "DOCUMENT=\"Answer\"\n",
        "TOPIC=\"qtype\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:35:29.184342Z",
          "iopub.status.busy": "2024-02-29T17:35:29.183979Z",
          "iopub.status.idle": "2024-02-29T17:35:29.189229Z",
          "shell.execute_reply": "2024-02-29T17:35:29.1881Z",
          "shell.execute_reply.started": "2024-02-29T17:35:29.184313Z"
        },
        "id": "Mkoj9IrZ9TD1",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "# Because this is just a sample, we keep only the first MAX_ROWS rows of the dataset.\n",
        "subset_data = data.head(MAX_ROWS)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rZHg_Qh69TD1"
      },
      "source": [
        "# Import and configure the Vector Database\n",
        "To store the information, I've chosen to use ChromaDB, one of the most well-known and widely used open-source vector databases.\n",
        "\n",
        "First we need to import ChromaDB."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:35:31.849551Z",
          "iopub.status.busy": "2024-02-29T17:35:31.849199Z",
          "iopub.status.idle": "2024-02-29T17:35:32.31736Z",
          "shell.execute_reply": "2024-02-29T17:35:32.316617Z",
          "shell.execute_reply.started": "2024-02-29T17:35:31.849525Z"
        },
        "id": "npJhuZQw9TD1",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "import chromadb"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8okox5C89TD1"
      },
      "source": [
        "Now we only need to indicate the path where the vector database will be stored."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:35:34.410646Z",
          "iopub.status.busy": "2024-02-29T17:35:34.410268Z",
          "iopub.status.idle": "2024-02-29T17:35:34.872817Z",
          "shell.execute_reply": "2024-02-29T17:35:34.872039Z",
          "shell.execute_reply.started": "2024-02-29T17:35:34.410614Z"
        },
        "id": "9yK6y0hm9TD1",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "chroma_client = chromadb.PersistentClient(path=\"/path/to/persist/directory\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7MhMwk3J9TD1"
      },
      "source": [
        "# Filling and Querying the ChromaDB Database\n",
        "The Data in ChromaDB is stored in collections. If the collection exist we need to delete it.\n",
        "\n",
        "In the next lines, we are creating the collection by calling the `create_collection` function in the `chroma_client` created above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:35:36.116012Z",
          "iopub.status.busy": "2024-02-29T17:35:36.1156Z",
          "iopub.status.idle": "2024-02-29T17:35:36.16922Z",
          "shell.execute_reply": "2024-02-29T17:35:36.168504Z",
          "shell.execute_reply.started": "2024-02-29T17:35:36.115977Z"
        },
        "id": "kRCsunE19TD1",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "collection_name = \"news_collection\"\n",
        "if len(chroma_client.list_collections()) > 0 and collection_name in [chroma_client.list_collections()[0].name]:\n",
        "    chroma_client.delete_collection(name=collection_name)\n",
        "\n",
        "collection = chroma_client.create_collection(name=collection_name)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rdEtcETr9TD2"
      },
      "source": [
        "We are now ready to add the data to the collection using the `add` function. This function requires three key pieces of information:\n",
        "\n",
        "* In the **document** we store the content of the `Answer` column in the Dataset.\n",
        "* In **metadatas**, we can inform a list of topics. I used the value in the column `qtype`.\n",
        "* In **id** we need to inform an unique identificator for each row. I'm creating the ID using the range of `MAX_ROWS`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2024-02-29T17:35:38.051601Z",
          "iopub.status.busy": "2024-02-29T17:35:38.051179Z",
          "iopub.status.idle": "2024-02-29T17:36:38.612836Z",
          "shell.execute_reply": "2024-02-29T17:36:38.611814Z",
          "shell.execute_reply.started": "2024-02-29T17:35:38.051569Z"
        },
        "id": "4dDoqJE79TD2",
        "outputId": "36f579dc-ec60-48b1-807a-1e68113cc9f4",
        "trusted": true
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/root/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:01<00:00, 68.1MiB/s]\n"
          ]
        }
      ],
      "source": [
        "collection.add(\n",
        "    documents=subset_data[DOCUMENT].tolist(),\n",
        "    metadatas=[{TOPIC: topic} for topic in subset_data[TOPIC].tolist()],\n",
        "    ids=[f\"id{x}\" for x in range(MAX_ROWS)],\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "du6-iuUisRkM"
      },
      "source": [
        "Once we have the information in the Database we can query it, and ask for data that matches our needs. The search is done inside the content of the document, and it dosn't look for the exact word, or phrase. The results will be based on the similarity between the search terms and the content of documents.\n",
        "\n",
        "Metadata isn't directly involved in the initial search process, it can be used to filter or refine the results after retrieval, enabling further customization and precision.\n",
        "\n",
        "Let's define a function to query the ChromaDB Database."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:36:38.616047Z",
          "iopub.status.busy": "2024-02-29T17:36:38.615302Z",
          "iopub.status.idle": "2024-02-29T17:36:38.620516Z",
          "shell.execute_reply": "2024-02-29T17:36:38.619561Z",
          "shell.execute_reply.started": "2024-02-29T17:36:38.616008Z"
        },
        "id": "UjdhZ4MJ9TD2",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "def query_database(query_text, n_results=10):\n",
        "    results = collection.query(query_texts=query_text, n_results=n_results )\n",
        "    return results"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CL0Crl3x9TD2"
      },
      "source": [
        "## Creating the semantic cache system\n",
        "To implement the cache system, we will use Faiss, a library that allows storing embeddings in memory. It's quite similar to what Chroma does, but without its persistence.\n",
        "\n",
        "For this purpose, we will create a class called `semantic_cache` that will work with its own encoder and provide the necessary functions for the user to perform queries.\n",
        "\n",
        "In this class, we first query the cache implemented with Faiss, that contains the previous petitions, and if the returned results are above a specified threshold, it will return the content of the cache. Otherwise, it will fetch the result from the Chroma database.\n",
        "\n",
        "The cache is stored in a .json file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:36:38.621968Z",
          "iopub.status.busy": "2024-02-29T17:36:38.621655Z",
          "iopub.status.idle": "2024-02-29T17:36:51.313356Z",
          "shell.execute_reply": "2024-02-29T17:36:51.312232Z",
          "shell.execute_reply.started": "2024-02-29T17:36:38.621936Z"
        },
        "id": "6OzUbRUe9TD2",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "!pip install -q faiss-cpu==1.8.0"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "id": "0yGE4cTEp3QJ"
      },
      "outputs": [],
      "source": [
        "import faiss\n",
        "from sentence_transformers import SentenceTransformer\n",
        "import time\n",
        "import json"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yi_riXHhcLy0"
      },
      "source": [
        "The `init_cache()` function below initializes the semantic cache.\n",
        "\n",
        "It employs the FlatLS index, which might not be the fastest but is ideal for small datasets. Depending on the characteristics of the data intended for the cache and the expected dataset size, another index such as HNSW or IVF could be utilized.\n",
        "\n",
        "I chose this index because it aligns well with the example. It can be used with vectors of high dimensions, consumes minimal memory, and performs well with small datasets.\n",
        "\n",
        "I outline the key features of the various indices available with Faiss.\n",
        "\n",
        "* FlatL2 or FlatIP. Well-suited for small datasets, it may not be the fastest, but its memory consumption is not excessive.\n",
        "* LSH. It works effectively with small datasets and is recommended for use with vectors of up to 128 dimensions.\n",
        "* HNSW. Very fast but demands a substantial amount of RAM.\n",
        "* IVF. Works well with large datasets without consuming much memory or compromising performance.\n",
        "\n",
        "More information about the different indices available with Faiss can be found at this link: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "id": "9poNBxbPl7xE"
      },
      "outputs": [],
      "source": [
        "def init_cache():\n",
        "  index = faiss.IndexFlatL2(768)\n",
        "  if index.is_trained:\n",
        "    print('Index trained')\n",
        "\n",
        "  # Initialize Sentence Transformer model\n",
        "  encoder = SentenceTransformer('all-mpnet-base-v2')\n",
        "\n",
        "  return index, encoder"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_uZzX60odo1U"
      },
      "source": [
        "In the `retrieve_cache` function, the .json file is retrieved from disk in case there is a need to reuse the cache across sessions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "FDJJ86TSp5CO"
      },
      "outputs": [],
      "source": [
        "def retrieve_cache(json_file):\n",
        "  try:\n",
        "    with open(json_file, 'r') as file:\n",
        "      cache = json.load(file)\n",
        "  except FileNotFoundError:\n",
        "      cache = {'questions': [], 'embeddings': [], 'answers': [], 'response_text': []}\n",
        "\n",
        "  return cache"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3uO-12UIdtSD"
      },
      "source": [
        "The `store_cache` function saves the file containing the cache data to disk."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "jx1CiKOcwKGn"
      },
      "outputs": [],
      "source": [
        "def store_cache(json_file, cache):\n",
        "  with open(json_file, 'w') as file:\n",
        "    json.dump(cache, file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "t9AdmnhQd2E8"
      },
      "source": [
        "These functions will be used within the `SemanticCache` class, which includes the search function and its initialization function.\n",
        "\n",
        "Even though the `ask` function has a substantial amount of code, its purpose is quite straightforward. It looks in the cache for the closest question to the one just made by the user.\n",
        "\n",
        "Afterward, checks if it is within the specified threshold. If positive, it directly returns the response from the cache; otherwise, it calls the `query_database` function to retrieve the data from ChromaDB.\n",
        "\n",
        "I've used Euclidean distance instead of Cosine, which is widely employed in vector comparisons. This choice is based on the fact that Euclidean distance is the default metric used by Faiss. Although Cosine distance can also be calculated, doing so adds complexity that may not significantly contribute to the final result.\n",
        "\n",
        "I have included FIFO eviction policy in the semantic_cache class, which aims to improve its efficiency and flexibility. By introducing eviction policies, we provide users with the ability to control how the cache behaves when it reaches its maximum capacity. This is crucial for maintaining optimal cache performance and for handling situations where the available memory is constrained. \n",
        "\n",
        "Looking at the structure of the cache, the implementation of FIFO seemed straightforward. Whenever a new question-answer pair is added to the cache, it's appended to the end of the lists. Thus, the oldest (first-in) items are at the front of the lists. When the cache reaches its maximum size and you need to evict an item, you remove (pop) the first item from each list. This is the FIFO eviction policy. \n",
        "\n",
        "\n",
        "Another eviction policy is the Least Recently Used (LRU) policy, which is more complex because it requires knowledge of when each item in the cache was last accessed. However, this policy is not yet available and will be implemented later.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 51,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:36:51.31678Z",
          "iopub.status.busy": "2024-02-29T17:36:51.316449Z",
          "iopub.status.idle": "2024-02-29T17:36:55.197427Z",
          "shell.execute_reply": "2024-02-29T17:36:55.196616Z",
          "shell.execute_reply.started": "2024-02-29T17:36:51.316746Z"
        },
        "id": "t_HVtwww9TD2",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "class semantic_cache:\n",
        "  def __init__(self, json_file=\"cache_file.json\", thresold=0.35, max_response=100, eviction_policy=None):\n",
        "    \"\"\"Initializes the semantic cache.\n",
        "\n",
        "    Args:\n",
        "    json_file (str): The name of the JSON file where the cache is stored.\n",
        "    thresold (float): The threshold for the Euclidean distance to determine if a question is similar.\n",
        "    max_response (int): The maximum number of responses the cache can store.\n",
        "    eviction_policy (str): The policy for evicting items from the cache. \n",
        "                            This can be any policy, but 'FIFO' (First In First Out) has been implemented for now.\n",
        "                            If None, no eviction policy will be applied.\n",
        "    \"\"\"\n",
        "       \n",
        "    # Initialize Faiss index with Euclidean distance\n",
        "    self.index, self.encoder = init_cache()\n",
        "\n",
        "    # Set Euclidean distance threshold\n",
        "    # a distance of 0 means identicals sentences\n",
        "    # We only return from cache sentences under this thresold\n",
        "    self.euclidean_threshold = thresold\n",
        "\n",
        "    self.json_file = json_file\n",
        "    self.cache = retrieve_cache(self.json_file)\n",
        "    self.max_response = max_response\n",
        "    self.eviction_policy = eviction_policy\n",
        "\n",
        "  def evict(self):\n",
        "\n",
        "    \"\"\"Evicts an item from the cache based on the eviction policy.\"\"\"\n",
        "    if self.eviction_policy and len(self.cache[\"questions\"]) > self.max_size:\n",
        "        for _ in range((len(self.cache[\"questions\"]) - self.max_response)):\n",
        "            if self.eviction_policy == 'FIFO':\n",
        "                self.cache[\"questions\"].pop(0)\n",
        "                self.cache[\"embeddings\"].pop(0)\n",
        "                self.cache[\"answers\"].pop(0)\n",
        "                self.cache[\"response_text\"].pop(0)\n",
        "\n",
        "  def ask(self, question: str) -> str:\n",
        "      # Method to retrieve an answer from the cache or generate a new one\n",
        "      start_time = time.time()\n",
        "      try:\n",
        "          #First we obtain the embeddings corresponding to the user question\n",
        "          embedding = self.encoder.encode([question])\n",
        "\n",
        "          # Search for the nearest neighbor in the index\n",
        "          self.index.nprobe = 8\n",
        "          D, I = self.index.search(embedding, 1)\n",
        "\n",
        "          if D[0] >= 0:\n",
        "              if I[0][0] >= 0 and D[0][0] <= self.euclidean_threshold:\n",
        "                  row_id = int(I[0][0])\n",
        "\n",
        "                  print('Answer recovered from Cache. ')\n",
        "                  print(f'{D[0][0]:.3f} smaller than {self.euclidean_threshold}')\n",
        "                  print(f'Found cache in row: {row_id} with score {D[0][0]:.3f}')\n",
        "                  print(f'response_text: ' + self.cache['response_text'][row_id])\n",
        "\n",
        "                  end_time = time.time()\n",
        "                  elapsed_time = end_time - start_time\n",
        "                  print(f\"Time taken: {elapsed_time:.3f} seconds\")\n",
        "                  return self.cache['response_text'][row_id]\n",
        "\n",
        "          # Handle the case when there are not enough results\n",
        "          # or Euclidean distance is not met, asking to chromaDB.\n",
        "          answer  = query_database([question], 1)\n",
        "          response_text = answer['documents'][0][0]\n",
        "\n",
        "          self.cache['questions'].append(question)\n",
        "          self.cache['embeddings'].append(embedding[0].tolist())\n",
        "          self.cache['answers'].append(answer)\n",
        "          self.cache['response_text'].append(response_text)\n",
        "\n",
        "          print('Answer recovered from ChromaDB. ')\n",
        "          print(f'response_text: {response_text}')\n",
        "\n",
        "          self.index.add(embedding)\n",
        "\n",
        "          self.evict()\n",
        "\n",
        "          store_cache(self.json_file, self.cache)\n",
        "          \n",
        "          end_time = time.time()\n",
        "          elapsed_time = end_time - start_time\n",
        "          print(f\"Time taken: {elapsed_time:.3f} seconds\")\n",
        "\n",
        "          return response_text\n",
        "      except Exception as e:\n",
        "          raise RuntimeError(f\"Error during 'ask' method: {e}\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UBWTqGM7i71N"
      },
      "source": [
        "### Testing the semantic_cache class."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 52,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "JH8s8eUtCMIS",
        "outputId": "c613bbfc-9f84-4a96-cd39-45972e69c15b"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Index trained\n"
          ]
        }
      ],
      "source": [
        "# Initialize the cache.\n",
        "cache = semantic_cache('4cache.json')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 53,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "mKqKLfDe_8bC",
        "outputId": "8a92ed95-c822-4382-c6db-d9de289341af"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Answer recovered from ChromaDB. \n",
            "response_text: Summary : Shots may hurt a little, but the diseases they can prevent are a lot worse. Some are even life-threatening. Immunization shots, or vaccinations, are essential. They protect against things like measles, mumps, rubella, hepatitis B, polio, tetanus, diphtheria, and pertussis (whooping cough). Immunizations are important for adults as well as children.    Your immune system helps your body fight germs by producing substances to combat them. Once it does, the immune system \"remembers\" the germ and can fight it again. Vaccines contain germs that have been killed or weakened. When given to a healthy person, the vaccine triggers the immune system to respond and thus build immunity.     Before vaccines, people became immune only by actually getting a disease and surviving it. Immunizations are an easier and less risky way to become immune.     NIH: National Institute of Allergy and Infectious Diseases\n",
            "Time taken: 0.057 seconds\n"
          ]
        }
      ],
      "source": [
        "results = cache.ask(\"How do vaccines work?\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dP7H6TypknLN"
      },
      "source": [
        "As expected, this response has been obtained from ChromaDB. The class then stores it in the cache.\n",
        "\n",
        "Now, if we send a second question that is quite different, the response should also be retrieved from ChromaDB. This is because the question stored previously is so dissimilar that it would surpass the specified threshold in terms of Euclidean distance."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 54,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2024-02-29T17:37:15.335593Z",
          "iopub.status.busy": "2024-02-29T17:37:15.335288Z",
          "iopub.status.idle": "2024-02-29T17:37:17.320691Z",
          "shell.execute_reply": "2024-02-29T17:37:17.319671Z",
          "shell.execute_reply.started": "2024-02-29T17:37:15.335566Z"
        },
        "id": "CvJykqVf9TD2",
        "outputId": "7137919e-e417-47b3-a638-18026b3edfe6",
        "trusted": true
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Answer recovered from ChromaDB. \n",
            "response_text: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\n",
            "Time taken: 0.082 seconds\n"
          ]
        }
      ],
      "source": [
        "\n",
        "results = cache.ask(\"Explain briefly what is a Sydenham chorea\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8aPWvU64lxOU"
      },
      "source": [
        "Perfect, the semantic cache system is behaving as expected.\n",
        "\n",
        "Let's proceed to test it with a question very similar to the one we just asked.\n",
        "\n",
        "In this case, the response should come directly from the cache without the need to access the ChromaDB database."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 55,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2024-02-29T17:37:17.328926Z",
          "iopub.status.busy": "2024-02-29T17:37:17.32865Z",
          "iopub.status.idle": "2024-02-29T17:37:17.463363Z",
          "shell.execute_reply": "2024-02-29T17:37:17.462397Z",
          "shell.execute_reply.started": "2024-02-29T17:37:17.328902Z"
        },
        "id": "9_5IcGB-9TD2",
        "outputId": "13563a7d-01f7-47d1-c345-6ad128f303c3",
        "trusted": true
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Answer recovered from Cache. \n",
            "0.028 smaller than 0.35\n",
            "Found cache in row: 1 with score 0.028\n",
            "response_text: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\n",
            "Time taken: 0.019 seconds\n"
          ]
        }
      ],
      "source": [
        "results = cache.ask(\"Briefly explain me what is a Sydenham chorea.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "M4H8RoXFqdwE"
      },
      "source": [
        "The two questions are so similar that their Euclidean distance is truly minimal, almost as if they were identical.\n",
        "\n",
        "Now, let's try another question, this time a bit more distinct, and observe how the system behaves."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 56,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ysj5P_MBCqju",
        "outputId": "d4639f73-dc7e-4c25-93ba-2a8c66dc7c61"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Answer recovered from Cache. \n",
            "0.228 smaller than 0.35\n",
            "Found cache in row: 1 with score 0.228\n",
            "response_text: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\n",
            "Time taken: 0.016 seconds\n"
          ]
        }
      ],
      "source": [
        "question_def = \"Write in 20 words what is a Sydenham chorea.\"\n",
        "results = cache.ask(question_def)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MFzXsQwB9TD3"
      },
      "source": [
        "We observe that the Euclidean distance has increased, but it still remains within the specified threshold. Therefore, it continues to return the response directly from the cache."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ot3wrq0p9TD3"
      },
      "source": [
        "# Loading the model and creating the prompt\n",
        "Time to use the library **transformers**, the most famous library from [hugging face](https://huggingface.co/) for working with language models.\n",
        "\n",
        "We are importing:\n",
        "* **Autotokenizer**: It is a utility class for tokenizing text inputs that are compatible with various pre-trained language models.\n",
        "* **AutoModelForCausalLM**: it provides an interface to pre-trained language models specifically designed for language generation tasks using causal language modeling (e.g., GPT models), or the model used in this notebook [Gemma-2b-it](https://huggingface.co/google/gemma-2b-it).\n",
        "\n",
        "Please, feel free to test [different Models](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending), you need to search for NLP models trained for text-generation.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:40:32.797669Z",
          "iopub.status.busy": "2024-02-29T17:40:32.797334Z",
          "iopub.status.idle": "2024-02-29T17:40:44.152114Z",
          "shell.execute_reply": "2024-02-29T17:40:44.151056Z",
          "shell.execute_reply.started": "2024-02-29T17:40:32.797635Z"
        },
        "id": "tdxiKqjT9TD3",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "!pip install torch"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:40:44.15434Z",
          "iopub.status.busy": "2024-02-29T17:40:44.153914Z",
          "iopub.status.idle": "2024-02-29T17:40:44.160144Z",
          "shell.execute_reply": "2024-02-29T17:40:44.159154Z",
          "shell.execute_reply.started": "2024-02-29T17:40:44.154292Z"
        },
        "id": "pIDMTCnH9TD7",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "from torch import cuda, torch\n",
        "#In a MAC Silicon the device must be 'mps'\n",
        "# device = torch.device('mps') #to use with MAC Silicon\n",
        "device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-02-29T17:41:25.628804Z",
          "iopub.status.busy": "2024-02-29T17:41:25.628412Z",
          "iopub.status.idle": "2024-02-29T17:41:30.202141Z",
          "shell.execute_reply": "2024-02-29T17:41:30.200774Z",
          "shell.execute_reply.started": "2024-02-29T17:41:25.628766Z"
        },
        "id": "CU2T4lp-9TD7",
        "trusted": true
      },
      "outputs": [],
      "source": [
        "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
        "\n",
        "model_id = \"google/gemma-2b-it\"\n",
        "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
        "model = AutoModelForCausalLM.from_pretrained(model_id,\n",
        "                                             device_map=device,  # reuse the device selected above\n",
        "                                             torch_dtype=torch.bfloat16)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GzHuFrAX9TD7"
      },
      "source": [
        "## Creating the extended prompt\n",
        "To create the prompt, we combine the result of querying the 'semantic_cache' class with the question entered by the user.\n",
        "\n",
        "The prompt has two parts: the **relevant context**, which is the information retrieved from the database, and the **user's question**.\n",
        "\n",
        "We only need to put the two parts together to create the prompt and then send it to the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 44,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 209
        },
        "id": "TdjbfAHhFuhS",
        "outputId": "4090da66-328e-478e-c2d7-1957597f8786"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "\"Relevant context: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\\n\\n The user's question: Write in 20 words what is a Sydenham chorea.\""
            ]
          },
          "execution_count": 44,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "prompt_template = f\"Relevant context: {results}\\n\\n The user's question: {question_def}\"\n",
        "prompt_template"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 45,
      "metadata": {
        "id": "DmYAcXEEECnz"
      },
      "outputs": [],
      "source": [
        "input_ids = tokenizer(prompt_template, return_tensors=\"pt\").to(device)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S-QXeuJ09TD8"
      },
      "source": [
        "Now all that remains is to send the prompt to the model and wait for its response!\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 46,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lheL8vHpEMDD",
        "outputId": "b646d648-b88d-4a29-ab30-427d00296255"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "<bos>Relevant context: Sydenham chorea (SD) is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS), the bacterium that causes rheumatic fever. SD is characterized by rapid, irregular, and aimless involuntary movements of the arms and legs, trunk, and facial muscles. It affects girls more often than boys and typically occurs between 5 and 15 years of age. Some children will have a sore throat several weeks before the symptoms begin, but the disorder can also strike up to 6 months after the fever or infection has cleared. Symptoms can appear gradually or all at once, and also may include uncoordinated movements, muscular weakness, stumbling and falling, slurred speech, difficulty concentrating and writing, and emotional instability. The symptoms of SD can vary from a halting gait and slight grimacing to involuntary movements that are frequent and severe enough to be incapacitating. The random, writhing movements of chorea are caused by an auto-immune reaction to the bacterium that interferes with the normal function of a part of the brain (the basal ganglia) that controls motor movements. Due to better sanitary conditions and the use of antibiotics to treat streptococcal infections, rheumatic fever, and consequently SD, are rare in North America and Europe. The disease can still be found in developing nations.\n",
            "\n",
            " The user's question: Write in 20 words what is a Sydenham chorea.\n",
            "\n",
            "Sure, here is a 20-word answer:\n",
            "\n",
            "Sydenham chorea is a neurological disorder of childhood resulting from infection via Group A beta-hemolytic streptococcus (GABHS).<eos>\n"
          ]
        }
      ],
      "source": [
        "outputs = model.generate(**input_ids,\n",
        "                         max_new_tokens=256)\n",
        "print(tokenizer.decode(outputs[0]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "execution": {
          "iopub.execute_input": "2023-07-12T22:01:56.993351Z",
          "iopub.status.busy": "2023-07-12T22:01:56.992775Z",
          "iopub.status.idle": "2023-07-12T22:01:57.001309Z",
          "shell.execute_reply": "2023-07-12T22:01:56.999431Z",
          "shell.execute_reply.started": "2023-07-12T22:01:56.993305Z"
        },
        "id": "Uo7lGXBV9TD8"
      },
      "source": [
        "# Conclusion.\n",
        "In this notebook, retrieving data from the cache takes about 50% less time than querying ChromaDB. In larger projects this difference tends to grow, and improvements of 90-95% can be reached.\n",
        "\n",
        "We have very little data in Chroma and only a single instance of the cache class. Typically, the data behind the cache system is much larger and may come from several sources rather than from a single query to a vector database.\n",
        "\n",
        "It's common to have multiple instances of the cache class, usually based on user typology, as questions tend to repeat more among users who share common traits.\n",
        "\n",
        "In summary, we have created a very simple RAG (Retrieval-Augmented Generation) system and enhanced it with a semantic cache layer that sits between the user's question and the retrieval of the information needed to create the enriched prompt."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "machine_shape": "hm",
      "provenance": []
    },
    "kaggle": {
      "accelerator": "gpu",
      "dataSources": [
        {
          "datasetId": 3496946,
          "sourceId": 6104553,
          "sourceType": "datasetVersion"
        }
      ],
      "dockerImageVersionId": 30527,
      "isGpuEnabled": true,
      "isInternetEnabled": true,
      "language": "python",
      "sourceType": "notebook"
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
